Model-based clustering of microarray expression data via latent Gaussian mixture models
Abstract
MOTIVATION: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.

RESULTS: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.

AVAILABILITY: The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info
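The abstract names two criteria: the Bayesian information criterion for model selection and the adjusted Rand index for scoring a clustering against known labels. The following is a minimal sketch of both, not the authors' implementation. The function names `bic` and `adjusted_rand_index` are hypothetical, and the "larger is better" sign convention for the BIC is an assumption, since the abstract does not state which form is used.

```python
from collections import Counter
from math import comb, log


def bic(log_likelihood: float, n_free_params: int, n_obs: int) -> float:
    """BIC in the assumed 'larger is better' form: 2*logL - k*log(n)."""
    return 2.0 * log_likelihood - n_free_params * log(n_obs)


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions agree trivially

    # Contingency-table cell counts and its row/column margins.
    cell_counts = Counter(zip(labels_true, labels_pred))
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)

    # Number of item pairs placed together in both, in the first,
    # and in the second partition respectively.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)
```

Under this convention, candidate models would be compared by computing `bic` for each fitted model and keeping the one with the largest value. An adjusted Rand index of 1 indicates perfect agreement with the reference labels, and a value near 0 indicates agreement no better than chance.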
Similar resources
Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model
This chapter describes a clustering procedure for microarray expression data based on a well-defined statistical model, specifically, a conjugate Dirichlet process mixture model. The clustering algorithm groups genes whose latent variables governing expression are equal, that is, genes belonging to the same mixture component. The model is fit with Markov chain Monte Carlo and the computational ...
Clustering time-course Microarray data using functional Bayesian infinite mixture model
This paper presents a new Bayesian, infinite mixture model based, clustering approach specifically designed for time-course microarray data. The problem is to group together genes which have “similar” expression profiles given the set of noisy measurements of their expression levels over a specific time interval. In order to capture temporal variations of each curve, a nonparametric regression ...
Nonparametric Bayesian Clustering via Infinite Warped Mixture Models
We introduce a flexible class of mixture models for clustering and density estimation. Our model allows clustering of non-linearly-separable data, produces a potentially low-dimensional latent representation, automatically infers the number of clusters, and produces a density estimate. Our approach makes use of two tools from Bayesian nonparametrics: a Dirichlet process mixture model to allow a...
Variable selection in clustering via Dirichlet process mixture models
The increased collection of high-dimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a model-based method that addresses the two problems simultaneously. We introduce a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure...
Visualisation of Reduced-Dimension Microarray Data Using Gaussian Mixture Models
Dimensionality reduction, clustering and visualisation methods proposed in recent years have afforded new possibilities for the analysis of gene expression data. However, efficient, novel techniques for processing and representing microarray data are still required. We propose the use of the discrete cosine and sine transformations for dimensionality reduction of microarray data. These techniqu...
Journal title:
- Bioinformatics
Volume 26, Issue 21
Pages: -
Publication date: 2010